RDLL at CrossLink Anchor Extraction Considering Ambiguity in CLLD

Authors

  • Fuminori Kimura
  • Kensuke Horita
  • Yuuki Konishi
  • Hisato Harada
  • Akira Maeda
Abstract

In this paper, we describe our work on the cross-lingual link discovery (CLLD) task at NTCIR-10. Our method focuses on two problems: how to find the important anchors in a source article that should be crosslinked, and how to find the correct target-language articles for those anchors. The system first uses online data collected from Japanese Wikipedia articles to build a basic crosslink database. These data are then used to identify anchors and to find the corresponding English articles. We carried out the task in three steps. First, we parsed the Japanese articles and extracted candidate anchors. Second, we ranked the anchors by importance weight. Third, we determined the correct English article for each anchor. Our run achieved an LMAP of 0.151 under manual assessment.
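
The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how anchors are weighted or how the target article is chosen, so everything below is an assumption made for illustration: the toy `CROSSLINK_DB`, the use of total link count as the importance weight, and the choice of the most frequently linked English title as the target. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical crosslink database (toy data, not from the paper):
# Japanese anchor text -> how often it links to each English article title.
CROSSLINK_DB = {
    "東京": Counter({"Tokyo": 9, "Tokyo Station": 1}),
    "京都": Counter({"Kyoto": 5}),
    "東京都": Counter({"Tokyo": 4}),
}


def extract_candidates(text, db):
    """Step 1: collect every known anchor string that occurs in the text."""
    return [anchor for anchor in db if anchor in text]


def rank_anchors(candidates, db):
    """Step 2: rank anchors by weight.

    Assumption: the weight is the anchor's total link count in the database.
    """
    return sorted(candidates, key=lambda a: sum(db[a].values()), reverse=True)


def resolve_target(anchor, db):
    """Step 3: choose an English article for the anchor.

    Assumption: an ambiguous anchor resolves to its most frequent target.
    """
    return db[anchor].most_common(1)[0][0]


def crosslink(text, db, top_n=5):
    """Run the three steps and return (anchor, English title) pairs."""
    ranked = rank_anchors(extract_candidates(text, db), db)
    return [(anchor, resolve_target(anchor, db)) for anchor in ranked[:top_n]]


if __name__ == "__main__":
    for anchor, title in crosslink("東京から京都へ行った。", CROSSLINK_DB):
        print(anchor, "->", title)
```

Plain substring matching is used here only to keep the sketch short; a real system would more likely segment the Japanese text first, which is one source of the anchor ambiguity the title refers to.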

Similar resources

DCU at NTCIR-10 Cross-lingual Link Discovery (CrossLink-2) Task

DCU participated in the English to Chinese (E2C) and Chinese to English (C2E) subtasks of the NTCIR-10 CrossLink-2 Cross-lingual Link Discovery (CLLD) task. Our strategy for each query involved extracting potential link anchors as n-gram strings, cleaning of potential anchor strings, and anchor expansion and ranking to select a set of anchors for the query. Potential anchors were translated usin...

UKP at CrossLink: Anchor Text Translation for Cross-lingual Link Discovery

This paper describes UKP’s participation in the cross-lingual link discovery (CLLD) task at NTCIR-9. The given task is to find valid anchor texts from a new English Wikipedia page and retrieve the corresponding target Wiki pages in Chinese, Japanese, and Korean languages. We have developed a CLLD framework consisting of anchor selection, anchor ranking, anchor translation, and target discovery ...

IISR Crosslink Approach at NTCIR 9 CLLD Task

In this paper, we describe our approach to the English-Korean Cross-Lingual Link Discovery (CLLD) task in NTCIR 9. We propose a simple and effective approach to discover the links. Our method comprises preprocessing steps, anchor-target link mapping, and the ranking steps. For discovering the links, we use the English anchor names, the inter-language links, and the translation by the Google Tra...

WUST EN-CS Crosslink System at NTCIR-9 CLLD Task

This paper describes our work in NTCIR-9 on the task of Cross-Lingual Link Discovery (Crosslink/CLLD). The work mainly focuses on two aspects to accomplish this task: (1) How to collect useful data for Crosslink and (2) How to use the data correctly and effectively. The system firstly uses online data collecting and text mining in Chinese Wikipedia articles to build the basic Crosslink database...

NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features

This paper describes the approaches of NTHU in the NTCIR-10 Cross-Lingual Link Discovery task, also named CrossLink-2. In this task, we aim to discover valuable anchors in Chinese, Japanese or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve the objective, we do not only depend on Wikipedia’s distinguishing features (e.g. anchor links information an...

Publication date: 2013